AutoML: a High-level Interface for Developers

AutoML is a return on cross-generational investments

Blaise Pascal:

Historically, most applications were in science, with some in industry:

  1. Radius of the earth
  2. Orbits of planets
  3. Census statistics
  4. Gambling
  5. Insurance
  6. Navigation
  7. Agriculture
  8. Assembly lines

ML is making people reconsider what software is and how it is created

Software 1.0 (deterministic):

  1. inherently deterministic
  2. errors are unacceptable
  3. little data, or data that is hard to collect
  4. ability to guess the complete algorithm and iterate on the design with some metric in mind
  5. the guessed algorithm underperforms and is practically infeasible

Software 2.0 (probabilistic):

  1. inherently probabilistic
  2. errors are acceptable
  3. data is easy to collect, or large amounts are already available
  4. inability to imagine the steps, or
  5. high performance requires an exponentially increasing number of steps covering a variety of corner cases
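The contrast can be made concrete with spam filtering. In the sketch below, the Software 1.0 version is a hand-written rule, while the Software 2.0 version estimates word statistics from labeled messages (a multinomial naive Bayes classifier, chosen here only as a simple stand-in for "a learned model"). The trigger words, training messages, and all function and class names are hypothetical; this illustrates the idea and is not how any particular AutoML product works.

```python
import math
from collections import Counter

# Software 1.0: the programmer writes the whole decision procedure by hand.
# These trigger words are hypothetical, chosen for illustration.
SPAM_WORDS = {"free", "winner", "prize"}


def rule_based_is_spam(message: str) -> bool:
    """Flag a message if it contains any hand-picked trigger word."""
    return any(word in SPAM_WORDS for word in message.lower().split())


# Software 2.0: the decision procedure is estimated from labeled examples.
class NaiveBayesSpamFilter:
    """Multinomial naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, messages: list[str], labels: list[bool]) -> "NaiveBayesSpamFilter":
        self.class_counts = Counter(labels)
        self.word_counts = {True: Counter(), False: Counter()}
        for message, label in zip(messages, labels):
            self.word_counts[label].update(message.lower().split())
        self.vocab_size = len(set(self.word_counts[True]) | set(self.word_counts[False]))
        return self

    def _log_score(self, message: str, label: bool) -> float:
        """log P(label) plus the sum of log P(word | label) over the message's words."""
        counts = self.word_counts[label]
        total = sum(counts.values())
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        for word in message.lower().split():
            # Add-one smoothing keeps an unseen word from zeroing out the score.
            score += math.log((counts[word] + 1) / (total + self.vocab_size))
        return score

    def spam_probability(self, message: str) -> float:
        """P(spam | message), obtained by normalizing the two class scores."""
        spam, ham = self._log_score(message, True), self._log_score(message, False)
        return 1.0 / (1.0 + math.exp(ham - spam))


# Hypothetical labeled messages (True = spam).
messages = ["win a free prize now", "free prize claim now",
            "lunch at noon", "see you at the meeting"]
labels = [True, True, False, False]
model = NaiveBayesSpamFilter().fit(messages, labels)

print(rule_based_is_spam("claim now"))            # → False: no trigger word
print(model.spam_probability("claim now") > 0.5)  # → True: learned from data
```

The hand-written rule misses "claim now" because it contains no trigger word, while the learned model leans toward spam because "claim" and "now" appeared only in the spam examples. Extending the 1.0 version means writing more rules by hand; extending the 2.0 version means collecting more labeled data.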

You don’t need to be a data scientist to build useful predictive models

Interfaces to ML:

  1. APIs with canned (pretrained) models, especially for text, images, and sound
  2. AutoML (Google, H2O)
  3. Robust open-source libraries (XGBoost)
  4. Modeling and inference frameworks (TensorFlow)

AutoML is an efficient, flexible interface to custom ML

AutoML is intended to:

  1. Open ML to non-experts
  2. Make sure a decent model is built and deployed
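At its core, making sure "a decent model is built" can be sketched as a search: fit several candidate models on training data and keep the one with the lowest error on held-out validation data. The sketch below is a minimal illustration in plain Python; the candidates (a mean baseline and k-nearest-neighbors regressors), the function names, and the toy data are all hypothetical and stand in for the much larger search spaces that AutoML services explore.

```python
from statistics import mean
from typing import Callable

Point = tuple[float, float]              # (feature, target)
Model = Callable[[float], float]         # maps a feature value to a prediction
Trainer = Callable[[list[Point]], Model]


def fit_mean(train: list[Point]) -> Model:
    """Baseline candidate: always predict the average training target."""
    avg = mean(y for _, y in train)
    return lambda x: avg


def make_knn(k: int) -> Trainer:
    """Return a trainer for a one-feature k-nearest-neighbors regressor."""
    def fit(train: list[Point]) -> Model:
        def predict(x: float) -> float:
            nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
            return mean(y for _, y in nearest)
        return predict
    return fit


def mse(model: Model, data: list[Point]) -> float:
    """Mean squared error of a model on a dataset."""
    return mean((model(x) - y) ** 2 for x, y in data)


def auto_select(train: list[Point], valid: list[Point],
                candidates: dict[str, Trainer]) -> tuple[str, Model, float]:
    """Fit every candidate on train, score it on valid, and return the best one."""
    scored = []
    for name, trainer in candidates.items():
        model = trainer(train)
        scored.append((mse(model, valid), name, model))
    error, name, model = min(scored, key=lambda s: s[0])
    return name, model, error


# Hypothetical toy data: y = 2x on the training set, plus two held-out points.
train = [(float(x), 2.0 * x) for x in range(10)]
valid = [(2.5, 5.0), (6.5, 13.0)]
candidates = {"mean": fit_mean, "knn-1": make_knn(1), "knn-3": make_knn(3)}
name, model, error = auto_select(train, valid, candidates)
print(name, error)
```

Selecting on validation data rather than training data is what keeps the search honest: a candidate that merely memorizes the training rows gains nothing unless it also predicts the held-out rows well.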

There are no universal algorithms, but some are very useful

  1. Tabular data dominates in terms of usefulness and sheer variety
  2. Regression/classification models take advantage of it

Application          Input               Output
Spam filtering       Text message        Pass/block
Online advertising   Ad, user info       Click/skip
Email routing        Email text          Support line
Wait time            Queue features      Minutes to wait
Employee scheduling  Time, store, role   Number of employees
Cybersecurity        Computer behavior   Compromised/normal
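Every row of the table has the same shape: a fixed set of input columns and one target column. A numeric target makes it a regression problem and a categorical one makes it a classification problem. A minimal sketch for the wait-time row, assuming a single hypothetical input feature (people ahead in the queue) and made-up data, fitted with ordinary least squares:

```python
from statistics import mean


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y ≈ slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical tabular rows: (people ahead in the queue, minutes actually waited).
rows = [(0, 1.0), (2, 5.0), (4, 9.0), (6, 13.0)]
slope, intercept = fit_line([r[0] for r in rows], [r[1] for r in rows])

# Predict the wait for a customer with 5 people ahead.
predicted_wait = slope * 5 + intercept
print(f"{predicted_wait:.1f} minutes")  # → 11.0 minutes
```

A real wait-time model would use several queue features at once; the point here is only that a tabular row maps directly onto inputs and a target.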

Competition in the AutoML market is heating up:

  1. AWS SageMaker
  2. Google AutoML
  3. Microsoft ML Studio
  4. DataRobot
  5. H2O
  6. Aible

GCP has a simple, practical interface

Examples:

  1. Text message spam probabilities and classification
  2. Real-estate sales price prediction
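For the spam example, the model returns a probability per message, and a threshold turns that probability into a pass/block decision. The sketch below uses hypothetical probabilities and labels (not output from any real service) and hypothetical function names, to show how moving the threshold trades recall for precision:

```python
def classify(probabilities: list[float], threshold: float = 0.5) -> list[str]:
    """Block a message when its spam probability reaches the threshold."""
    return ["block" if p >= threshold else "pass" for p in probabilities]


def precision_recall(decisions: list[str], is_spam: list[bool]) -> tuple[float, float]:
    """Precision: share of blocked messages that were spam.
    Recall: share of spam messages that were blocked."""
    pairs = list(zip(decisions, is_spam))
    tp = sum(1 for d, s in pairs if d == "block" and s)
    fp = sum(1 for d, s in pairs if d == "block" and not s)
    fn = sum(1 for d, s in pairs if d == "pass" and s)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical model outputs and true labels for four messages.
probabilities = [0.95, 0.70, 0.60, 0.10]
is_spam = [True, False, True, False]

for threshold in (0.5, 0.9):
    decisions = classify(probabilities, threshold)
    print(threshold, decisions, precision_recall(decisions, is_spam))
```

At 0.5 every spam message is blocked, but so is one legitimate message; at 0.9 nothing legitimate is blocked, but one spam message gets through. Which threshold is right depends on the cost of each kind of error, which is a business decision rather than something the model decides.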

AutoML is no silver bullet

Positives:

  1. workflow
  2. data splits
  3. good model implementations
  4. infra management
  5. path to deployment and value
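"Data splits" means holding out part of the data so that models are selected and evaluated on rows they were not trained on. A minimal sketch of a reproducible random split, assuming an 80/10/10 train/validation/test ratio as a common convention; the function name and defaults are hypothetical, and managed AutoML services may offer other strategies, such as manual or time-based splits.

```python
import random
from typing import Iterable, TypeVar

T = TypeVar("T")


def split_dataset(rows: Iterable[T],
                  fractions: tuple[float, float, float] = (0.8, 0.1, 0.1),
                  seed: int = 0) -> tuple[list[T], list[T], list[T]]:
    """Shuffle rows reproducibly and cut them into train/validation/test lists."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed: same split on every run
    n_train = int(len(shuffled) * fractions[0])
    n_valid = int(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # the remainder absorbs rounding
    return train, valid, test


train, valid, test = split_dataset(range(100))
print(len(train), len(valid), len(test))  # → 80 10 10
```

The test set is touched only once, after model selection on the validation set, so its score remains an unbiased estimate of performance on new data.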

Negatives:

  1. model limitations
  2. compute wasted on searching over models with irrelevant differences
  3. when something breaks, it is hard to debug and iterate inside the environment

AutoML will be an important factor in supporting the exponential spread of ML across society

Slava Nikitin